Basic SQLAlchemy

What is SQL?

SQL (Structured Query Language, pronounced both as S-Q-L and Sequel) is a programming language designed to interface with data held in a relational database. These are databases within which you can hold different kinds of entries and define relationships between them. For example, you can define many Star entries and many Observation entries and then define a relationship in that each Observation is of a specific Star or of multiple Stars.

This can be exceptionally useful in working with large sets of data. Each entry in your database has a set of columns, which can be strings, floats, ints, or booleans (though SQLAlchemy can perform some tricks to expand this list). The power of using databases comes from the ability to query your data according to these columns. For example, the Star entry can have an RA column and a DEC column, and so, you could query for a list of Stars in a particular part of the sky.

But this is thinking too small. With a relational database, you could get a list of all the stars in your database with at least three observations with good seeing above a particular magnitude in only a few lines of code. And now, with SQLAlchemy, you can do it all in Python.

What is SQLAlchemy?

SQLAlchemy is a Python module that acts as an interface between Python code and a SQL database. Effectively, you can write object classes in Python whose instances are rows in a particular table in your database. This lets you manage your database straight from Python, giving you access to a full programming language of possibilities. I'll offer some practical examples in the Practical Examples notebook, but here's an example I've done in my research: you can generate plots directly from the database using an arbitrary query, and can use the relationships between database entries to make interactive plots.

For more details on SQLAlchemy, check out their website http://www.sqlalchemy.org. In particular, you should walk through their tutorial (of which I'll be giving a taste here).

A final note: there are other tools to do this in Python, and another that I've used is Django. I often find that Django has a much better user interface than SQLAlchemy; however, it isn't really designed to be used just for its databasing. Django is a web framework, and extracting the database part of the framework looks like it would be a bit tricky. If your goal is to write a database for a website, though, I'd definitely recommend Django over SQLAlchemy.

Getting Started

Instructions: Sentences in bold are instructions. These notebooks will not work unless the instructions are completed. Another note: if you find that SQLAlchemy is complaining that you are modifying an existing table, you should restart the notebook (the circular arrow in the toolbar above) and execute all the cells down from the top. This is a quirk of SQLAlchemy, and one that hopefully will make sense with some experience.

To get started using SQLAlchemy, we have to create an Engine. This is the interface between SQLAlchemy and the actual database, and this step points SQLAlchemy to the database of interest.

For the moment, to avoid polluting your computer, we'll create a temporary database in memory as shown below. To create a real database, the first argument to create_engine should be 'dbtype://path/file_name'. The echo argument is for debugging purposes. We'll turn it on now for you to see a bit of what's going on behind the scenes, but feel free not to set it.


In [2]:
from sqlalchemy import create_engine

engine = create_engine ('sqlite:///:memory:', echo = True)

Now that we have the interface to our database, we need to define a table to put into the database. SQLAlchemy allows you two ways to do this: the declarative way and the classical way. For the most part, the declarative method is more "Pythonic," so I'm going to focus on that one. Let's make a Star entry. To do this, we need to load the declarative method.


In [3]:
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

This generates a base class from which all of our entries will inherit. Effectively, this base class sets up all the methods that our entry object will need to work as part of a database. Try adding ra and dec float columns.


In [4]:
from sqlalchemy import Column, Integer, String, Float

class Star (Base):    
    __tablename__ = 'stars' # This is the name of the table in the database. Must be unique
    
# After the table's name, we add the desired columns. These contain the queryable information about
#    the current entry.

    id = Column (Integer, primary_key = True) # This is the primary key of the entry. 
# The primary key is the unique identifier of the entry. It's usually best to leave it as an int.

    name = Column (String, unique = True) # The unique argument enforces uniqueness for the column


    def __repr__ (self):
        # A string representation of the object
        # We can call the various columns with self.column_name, as with any normal class variable
        return "<Star Object %s at (RA=%f, DEC=%f)>" % (self.name, self.ra, self.dec)
    
#     CheckConstraint('col2 > col3 + 5', name='check1')

Once the class has been declared, we need to create it in the database, i.e. we need to connect the Engine class to our new Star class. We can do this through our Base class, which will add any subclasses it can find as tables in our database. If you set echo = True in the creation of your engine, this will be a bit verbose.


In [8]:
Base.metadata.create_all (engine)


2015-01-26 11:24:48,736 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
INFO:sqlalchemy.engine.base.Engine:SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
2015-01-26 11:24:48,737 INFO sqlalchemy.engine.base.Engine ()
INFO:sqlalchemy.engine.base.Engine:()
2015-01-26 11:24:48,739 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
INFO:sqlalchemy.engine.base.Engine:SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
2015-01-26 11:24:48,739 INFO sqlalchemy.engine.base.Engine ()
INFO:sqlalchemy.engine.base.Engine:()
2015-01-26 11:24:48,742 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("stars")
INFO:sqlalchemy.engine.base.Engine:PRAGMA table_info("stars")
2015-01-26 11:24:48,743 INFO sqlalchemy.engine.base.Engine ()
INFO:sqlalchemy.engine.base.Engine:()
2015-01-26 11:24:48,745 INFO sqlalchemy.engine.base.Engine 
CREATE TABLE stars (
	id INTEGER NOT NULL, 
	name VARCHAR, 
	PRIMARY KEY (id), 
	UNIQUE (name)
)


INFO:sqlalchemy.engine.base.Engine:
CREATE TABLE stars (
	id INTEGER NOT NULL, 
	name VARCHAR, 
	PRIMARY KEY (id), 
	UNIQUE (name)
)


2015-01-26 11:24:48,746 INFO sqlalchemy.engine.base.Engine ()
INFO:sqlalchemy.engine.base.Engine:()
2015-01-26 11:24:48,748 INFO sqlalchemy.engine.base.Engine COMMIT
INFO:sqlalchemy.engine.base.Engine:COMMIT

Congratulations! You've made a database! Of course, making it and using it are two different things. To interface with the database, you need a Session object. This object serves as your interface to SQLAlchemy. With it, you can create, delete, modify, and query entries. The syntax follows:


In [9]:
from sqlalchemy.orm import sessionmaker

Session = sessionmaker (bind = engine)

Note that Session is a class, not a class instance. When you want an instance of a session, you will need to do so explicitly (as I'll show in a moment).

To give an idea of how to use this, all of the above code would exist as a Python module that defines your database and gets it ready for use. When you need to query the database, you import this module and run, for example:


In [7]:
session = Session () # Start the SQLAlchemy session

newStar = Star (name = "Polaris", ra = 2.5303, dec = 89.2641) # Make a Star to put into the database

session.add (newStar) # Add newStar to the database

We've almost added a new object to the database. SQLAlchemy wants to be very careful to prevent you from making mistakes. To commit the new objects to the database, use


In [8]:
session.commit () # Commit the changes to the database


2015-01-23 20:16:14,706 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
INFO:sqlalchemy.engine.base.Engine:BEGIN (implicit)
2015-01-23 20:16:14,708 INFO sqlalchemy.engine.base.Engine INSERT INTO stars (name, ra, dec) VALUES (?, ?, ?)
INFO:sqlalchemy.engine.base.Engine:INSERT INTO stars (name, ra, dec) VALUES (?, ?, ?)
2015-01-23 20:16:14,708 INFO sqlalchemy.engine.base.Engine ('Polaris', 2.5303, 89.2641)
INFO:sqlalchemy.engine.base.Engine:('Polaris', 2.5303, 89.2641)
2015-01-23 20:16:14,710 INFO sqlalchemy.engine.base.Engine COMMIT
INFO:sqlalchemy.engine.base.Engine:COMMIT

And that's it! You've successfully added Polaris to the database. Try adding a few more; these will be useful later. You can also try modifying the properties of newStar, but don't commit your changes just yet.


In [8]:

We can always check whether SQLAlchemy has anything to commit with session.new and session.dirty. Note that we don't have to add newStar into the database again, the session's already tracking it.


In [9]:
session.new


Out[9]:
IdentitySet([])

In [10]:
session.dirty


Out[10]:
IdentitySet([])

In [11]:
session.commit ()

To delete an object from the database, just use session.delete.


In [12]:
session.delete (newStar)

Finally, say we didn't want to commit that last command. Just use the rollback method of session.


In [13]:
session.rollback ()

That's all there is to adding and deleting entries with SQLAlchemy! But of course, we haven't actually checked that the things we've added are in the database. This brings us to our next point: querying the database.

Querying the Database

The power of databases comes from your ability to query them. Effectively, you need to tell SQLAlchemy what to look for in the database. An interesting note here: when you call a query, this returns a Query object, not the results of that query. You'll see what I mean in a minute.

To query the database, use the query method of the session object. This takes as arguments the types of objects you'd like to query. In this case, we're querying the Star object.


In [14]:
session.query (Star)


Out[14]:
<sqlalchemy.orm.query.Query at 0x1053eb358>

As I said, this returns a Query object. The power of this is that SQLAlchemy won't actually look through the database until it actually needs to. This can be very useful when you start to use filters, but for the moment, let's figure out how to retrieve objects from the database:


In [15]:
query = session.query (Star)

query.all () # Outputs a list of everything that matches the query


2015-01-23 20:16:21,577 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
INFO:sqlalchemy.engine.base.Engine:BEGIN (implicit)
2015-01-23 20:16:21,578 INFO sqlalchemy.engine.base.Engine SELECT stars.id AS stars_id, stars.name AS stars_name, stars.ra AS stars_ra, stars.dec AS stars_dec 
FROM stars
INFO:sqlalchemy.engine.base.Engine:SELECT stars.id AS stars_id, stars.name AS stars_name, stars.ra AS stars_ra, stars.dec AS stars_dec 
FROM stars
2015-01-23 20:16:21,579 INFO sqlalchemy.engine.base.Engine ()
INFO:sqlalchemy.engine.base.Engine:()
Out[15]:
[<Star Object Polaris at (RA=2.530300, DEC=89.264100)>]

In [16]:
query.order_by (Star.ra).first () # Outputs the first entry found, should be used with order_by


2015-01-23 20:16:22,057 INFO sqlalchemy.engine.base.Engine SELECT stars.id AS stars_id, stars.name AS stars_name, stars.ra AS stars_ra, stars.dec AS stars_dec 
FROM stars ORDER BY stars.ra
 LIMIT ? OFFSET ?
INFO:sqlalchemy.engine.base.Engine:SELECT stars.id AS stars_id, stars.name AS stars_name, stars.ra AS stars_ra, stars.dec AS stars_dec 
FROM stars ORDER BY stars.ra
 LIMIT ? OFFSET ?
2015-01-23 20:16:22,058 INFO sqlalchemy.engine.base.Engine (1, 0)
INFO:sqlalchemy.engine.base.Engine:(1, 0)
Out[16]:
<Star Object Polaris at (RA=2.530300, DEC=89.264100)>

In [21]:
query.count () # Outputs the number of matches


2015-01-14 19:06:15,157 INFO sqlalchemy.engine.base.Engine SELECT count(*) AS count_1 
FROM (SELECT stars.id AS stars_id, stars.name AS stars_name, stars.dec AS stars_dec, stars.ra AS stars_ra 
FROM stars) AS anon_1
INFO:sqlalchemy.engine.base.Engine:SELECT count(*) AS count_1 
FROM (SELECT stars.id AS stars_id, stars.name AS stars_name, stars.dec AS stars_dec, stars.ra AS stars_ra 
FROM stars) AS anon_1
2015-01-14 19:06:15,157 INFO sqlalchemy.engine.base.Engine ()
INFO:sqlalchemy.engine.base.Engine:()
Out[21]:
2

In [23]:
query.one () # If there's only one match, output it; otherwise, raise an error


2015-01-14 19:07:20,575 INFO sqlalchemy.engine.base.Engine SELECT stars.id AS stars_id, stars.name AS stars_name, stars.dec AS stars_dec, stars.ra AS stars_ra 
FROM stars
INFO:sqlalchemy.engine.base.Engine:SELECT stars.id AS stars_id, stars.name AS stars_name, stars.dec AS stars_dec, stars.ra AS stars_ra 
FROM stars
2015-01-14 19:07:20,576 INFO sqlalchemy.engine.base.Engine ()
INFO:sqlalchemy.engine.base.Engine:()
---------------------------------------------------------------------------
MultipleResultsFound                      Traceback (most recent call last)
<ipython-input-23-91f992e02acb> in <module>()
----> 1 query.one () # If there's only one match, output it; otherwise, raise an error

/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/sqlalchemy/orm/query.py in one(self)
   2376         else:
   2377             raise orm_exc.MultipleResultsFound(
-> 2378                 "Multiple rows were found for one()")
   2379 
   2380     def scalar(self):

MultipleResultsFound: Multiple rows were found for one()

These queries can be filtered and ordered as well.


In [25]:
query.filter (Star.ra < 5).all ()


2015-01-14 19:09:16,110 INFO sqlalchemy.engine.base.Engine SELECT stars.id AS stars_id, stars.name AS stars_name, stars.dec AS stars_dec, stars.ra AS stars_ra 
FROM stars 
WHERE stars.ra < ?
INFO:sqlalchemy.engine.base.Engine:SELECT stars.id AS stars_id, stars.name AS stars_name, stars.dec AS stars_dec, stars.ra AS stars_ra 
FROM stars 
WHERE stars.ra < ?
2015-01-14 19:09:16,110 INFO sqlalchemy.engine.base.Engine (5,)
INFO:sqlalchemy.engine.base.Engine:(5,)
Out[25]:
[<Star Object Alpha Ursae Minoris at (RA=2.530300, DEC=89.264100)>]

In [28]:
query.order_by (Star.dec).all ()


2015-01-14 19:10:07,986 INFO sqlalchemy.engine.base.Engine SELECT stars.id AS stars_id, stars.name AS stars_name, stars.dec AS stars_dec, stars.ra AS stars_ra 
FROM stars ORDER BY stars.dec
INFO:sqlalchemy.engine.base.Engine:SELECT stars.id AS stars_id, stars.name AS stars_name, stars.dec AS stars_dec, stars.ra AS stars_ra 
FROM stars ORDER BY stars.dec
2015-01-14 19:10:07,987 INFO sqlalchemy.engine.base.Engine ()
INFO:sqlalchemy.engine.base.Engine:()
Out[28]:
[<Star Object Alpha Orionis at (RA=5.920000, DEC=7.407100)>,
 <Star Object Alpha Ursae Minoris at (RA=2.530300, DEC=89.264100)>]

But remember, this is Python. Star.ra and Star.dec are Column objects, and the SQLAlchemy team have added algebraic operations to these objects, so you can filter and order by combinations of columns.


In [30]:
query.filter (Star.ra - Star.dec / 2 < 5).all ()


2015-01-14 19:13:44,505 INFO sqlalchemy.engine.base.Engine SELECT stars.id AS stars_id, stars.name AS stars_name, stars.dec AS stars_dec, stars.ra AS stars_ra 
FROM stars 
WHERE stars.ra - stars.dec / ? < ?
INFO:sqlalchemy.engine.base.Engine:SELECT stars.id AS stars_id, stars.name AS stars_name, stars.dec AS stars_dec, stars.ra AS stars_ra 
FROM stars 
WHERE stars.ra - stars.dec / ? < ?
2015-01-14 19:13:44,506 INFO sqlalchemy.engine.base.Engine (2, 5)
INFO:sqlalchemy.engine.base.Engine:(2, 5)
Out[30]:
[<Star Object Alpha Ursae Minoris at (RA=2.530300, DEC=89.264100)>,
 <Star Object Alpha Orionis at (RA=5.920000, DEC=7.407100)>]

The filter method returns another Query object, so you can continue to add filters and orderings.


In [37]:
query.filter (Star.ra < 5).filter (Star.dec > 0).all ()


2015-01-14 19:25:31,117 INFO sqlalchemy.engine.base.Engine SELECT stars.id AS stars_id, stars.name AS stars_name, stars.dec AS stars_dec, stars.ra AS stars_ra 
FROM stars 
WHERE stars.ra < ? AND stars.dec > ?
INFO:sqlalchemy.engine.base.Engine:SELECT stars.id AS stars_id, stars.name AS stars_name, stars.dec AS stars_dec, stars.ra AS stars_ra 
FROM stars 
WHERE stars.ra < ? AND stars.dec > ?
2015-01-14 19:25:31,118 INFO sqlalchemy.engine.base.Engine (5, 0)
INFO:sqlalchemy.engine.base.Engine:(5, 0)
Out[37]:
[<Star Object Alpha Ursae Minoris at (RA=2.530300, DEC=89.264100)>]

Feel free to play around. You might want to add some more entries to the database so that you can truly see what you can do. In what order do things pop out when you order by one column and then another?


In [37]:

You now have a good amount of experience with adding entries to a table that already exists and with querying that database using filters. For more information on this, check out the documentation (or just search in StackOverflow). It will generally help if you can figure out the terminology for what you're trying to do in SQL (e.g. union, subquery), as that will help narrow your search.

If you're bored of normal, non-relational databases, follow the link to the next notebook! If you want to learn how to add constraints to your objects, keep reading.

Adding Constraints

Another convenient aspect of databases is the ability to add constraints. These will effectively set certain conditions on anything added to the database and raise an error if anything that violates these constraints is added to the database. You've already seen one of these constraints, the unique constraint, which forces a particular column to be unique in the database. However, we can take this a step further. What if we want two columns to be unique together? For example, what if we want to make sure that no two stars in our database have the exact same RA and the exact same Dec? We do this by explicitly adding a constraint.


In [3]:
from sqlalchemy import UniqueConstraint
from sqlalchemy import Column, Integer, String, Float

class Star (Base):    
    __tablename__ = 'stars' # This is the name of the table in the database. Must be unique
    
# After the table's name, we add the desired columns. These contain the queryable information about
#    the current entry.

    id = Column (Integer, primary_key = True) # This is the primary key of the entry. 
# The primary key is the unique identifier of the entry. It's usually best to leave it as an int.

    name = Column (String, unique = True) # The unique argument enforces uniqueness for the column
    
    ra = Column (Float)
    dec = Column (Float)
    
    __table_args__ = (UniqueConstraint ("ra", "dec"),)
    
    def __repr__ (self):
        # A string representation of the object
        # We can call the various columns with self.column_name, as with any normal class variable
        return "<Star Object %s at (RA=%f, DEC=%f)>" % (self.name, self.ra, self.dec)

Paste this new definition of Star up to the original definition and restart the notebook (each time you change the table in the database, you'll most likely want to nuke the database and bring it back from scratch; there are ways to avoid this, called migrations, but these are quite complicated and usually not worth it except in production level databases). Try adding two stars with different names but the same RA and Dec and see if you get an error.


In [ ]:

The new piece we've added is the __table_args__ member of the object. This is a special object that SQLAlchemy will read to add special commands to your database. Among other things, constraints are placed here. This member should be a tuple of all the constraints you want to add to your object. Let's look at another type of constraint, the check constraint. This makes sure that certain conditions are met and has the following notation:


In [8]:
from sqlalchemy import CheckConstraint

CheckConstraint ("ra < 24")


Out[8]:
CheckConstraint(<sqlalchemy.sql.elements.TextClause object at 0x10cb86278>)

Try adding this constraint in addition to a minimum for RA and likewise for Dec to your Star.


In [ ]: